{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Run Model Bias Analysis with SageMaker Clarify (Post-Training)\n",
    "\n",
    "## Using SageMaker Processing Jobs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "import sagemaker\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "sess   = sagemaker.Session()\n",
    "bucket = sess.default_bucket()\n",
    "role = sagemaker.get_execution_role()\n",
    "region = boto3.Session().region_name\n",
    "\n",
    "sm = boto3.Session().client(service_name='sagemaker', region_name=region)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r training_job_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    training_job_name\n",
    "    print('[OK]')\n",
    "except NameError:\n",
    "    print('+++++++++++++++++++++++++++++++')\n",
    "    print('[ERROR] Please run the notebooks in the previous TRAIN section before you continue.')\n",
    "    print('+++++++++++++++++++++++++++++++')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(training_job_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Review Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "data = pd.read_json('data-clarify/test_bias.jsonl', lines=True)\n",
    "data.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "\n",
    "inference_image_uri = sagemaker.image_uris.retrieve(\n",
    "    framework=\"tensorflow\",\n",
    "    region=region,\n",
    "    version=\"2.3.1\",\n",
    "    py_version=\"py37\",\n",
    "    instance_type='ml.m5.4xlarge',\n",
    "    image_scope=\"inference\"\n",
    ")\n",
    "print(inference_image_uri)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_name = sess.create_model_from_job(\n",
    "    training_job_name=training_job_name,\n",
    "    image_uri=inference_image_uri\n",
    ")\n",
    "print(model_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Detecting Bias with Amazon SageMaker Clarify\n",
    "\n",
    "SageMaker Clarify helps you detect possible pre- and post-training biases using a variety of metrics."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## We need use a patch image to support JSONLines correctly for now\n",
    "                         \n",
    "```clarify_processor = clarify.SageMakerClarifyProcessor(...\n",
    "        clarify_processor.image_uri = xxx```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker import clarify\n",
    "\n",
    "clarify_processor = clarify.SageMakerClarifyProcessor(role=role,\n",
    "                                                      instance_count=1,\n",
    "                                                      instance_type='ml.c5.2xlarge',\n",
    "                                                      sagemaker_session=sess)\n",
    "\n",
    "# patch image \n",
    "clarify_processor.image_uri = \"678264136642.dkr.ecr.us-east-1.amazonaws.com/sagemaker-clarify-processing:1.0_jsonlines_patch\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Post-training Bias\n",
    "Computing post-training bias metrics does require a trained model.\n",
    "\n",
    "Unbiased training data (as determined by concepts of fairness measured by bias metric) may still result in biased model predictions after training. Whether this occurs depends on several factors including hyperparameter choices.\n",
    "\n",
    "\n",
    "You can run these options separately with `run_pre_training_bias()` and `run_post_training_bias()` or at the same time with `run_bias()` as shown below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Writing DataConfig and ModelConfig\n",
    "A `DataConfig` object communicates some basic information about data I/O to Clarify. We specify where to find the input dataset, where to store the output, the target column (`label`), the header names, and the dataset type.\n",
    "\n",
    "Similarly, the `ModelConfig` object communicates information about your trained model and `ModelPredictedLabelConfig` provides information on the format of your predictions.  \n",
    "\n",
    "**Note**: To avoid additional traffic to your production models, SageMaker Clarify sets up and tears down a dedicated endpoint when processing. `ModelConfig` specifies your preferred instance type and instance count used to run your model on during Clarify's processing."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Upload Data To S3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "path = './data-clarify/test_bias.jsonl'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!head -n 5 $path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_bias_path_jsonlines_s3_uri = sess.upload_data(bucket=bucket, key_prefix=training_job_name, path=path)\n",
    "test_bias_path_jsonlines_s3_uri"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store test_bias_path_jsonlines_s3_uri"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Configure Clarify"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bias_report_output_path = 's3://{}/{}/clarify-bias'.format(bucket, training_job_name)\n",
    "\n",
    "data_config = clarify.DataConfig(s3_data_input_path=test_bias_path_jsonlines_s3_uri,\n",
    "                                 s3_output_path=bias_report_output_path,\n",
    "                                 label='star_rating',\n",
    "                                 features='features',\n",
    "                                 # label must be last, features in exact order as passed into model\n",
    "                                 headers=['review_body', 'product_category', 'star_rating'],\n",
    "                                 dataset_type='application/jsonlines')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_config = clarify.ModelConfig(model_name=model_name,\n",
    "                                   instance_type='ml.m5.4xlarge',\n",
    "                                   instance_count=1,\n",
    "                                   content_type='application/jsonlines',\n",
    "                                   accept_type='application/jsonlines',\n",
    "                                   # {\"features\": [\"the worst\", \"Digital_Software\"]}\n",
    "                                   content_template='{\"features\":$features}'\n",
    "                                  )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## _Note: `label` is set to the JSON key for the model prediction results_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictions_config = clarify.ModelPredictedLabelConfig(label='predicted_label')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Run Clarify"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bias_config = clarify.BiasConfig(label_values_or_threshold=[5,4], # needs to be int or str for continuous dtype, needs to be >1 for categorical dtype\n",
    "                                facet_name='product_category',\n",
    "#                                facet_values_or_threshold=['Gift Card'],\n",
    "                                group_name='product_category')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clarify_processor.run_post_training_bias(\n",
    "    data_config=data_config,\n",
    "    data_bias_config=bias_config,\n",
    "    model_config=model_config,\n",
    "    model_predicted_label_config=predictions_config,\n",
    "#    methods='all', # FlipTest requires all columns to be numeric\n",
    "    methods=['DPPL', 'DI', 'DCA', 'DCR', 'RD', 'DAR', 'DRR', 'AD', 'CDDPL', 'TE'],\n",
    "    wait=False,\n",
    "    logs=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "run_post_training_bias_processing_job_name = clarify_processor.latest_job.job_name\n",
    "run_post_training_bias_processing_job_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(HTML('<b>Review <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/processing-jobs/{}\">Processing Job</a></b>'.format(region, run_post_training_bias_processing_job_name)))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(HTML('<b>Review <a target=\"blank\" href=\"https://console.aws.amazon.com/cloudwatch/home?region={}#logStream:group=/aws/sagemaker/ProcessingJobs;prefix={};streamFilter=typeLogStreamPrefix\">CloudWatch Logs</a> After About 5 Minutes</b>'.format(region, run_post_training_bias_processing_job_name)))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(HTML('<b>Review <a target=\"blank\" href=\"https://s3.console.aws.amazon.com/s3/buckets/{}/{}/?region={}&tab=overview\">S3 Output Data</a> After The Processing Job Has Completed</b>'.format(bucket, run_post_training_bias_processing_job_name, region)))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "running_processor = sagemaker.processing.ProcessingJob.from_processing_name(processing_job_name=run_post_training_bias_processing_job_name,\n",
    "                                                                            sagemaker_session=sess)\n",
    "\n",
    "processing_job_description = running_processor.describe()\n",
    "\n",
    "print(processing_job_description)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "running_processor.wait(logs=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Viewing the Bias Report\n",
    "In Studio, you can view the results under the experiments tab.\n",
    "\n",
    "<img src=\"img/bias_report.gif\">\n",
    "\n",
    "Each bias metric has detailed explanations with examples that you can explore.\n",
    "\n",
    "<img src=\"img/bias_detail.gif\">\n",
    "\n",
    "You could also summarize the results in a handy table!\n",
    "\n",
    "<img src=\"img/bias_report_chart.gif\">\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you're not a Studio user yet, you can access the bias report in pdf, html and ipynb formats in the following S3 bucket:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Download Report From S3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 ls $bias_report_output_path/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 cp --recursive $bias_report_output_path ./generated_bias_report/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(HTML('<b>Review <a target=\"blank\" href=\"./generated_bias_report/report.html\">Bias Report</a></b>'))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Release Resources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%html\n",
    "\n",
    "<p><b>Shutting down your kernel for this notebook to release resources.</b></p>\n",
    "<button class=\"sm-command-button\" data-commandlinker-command=\"kernelmenu:shutdown\" style=\"display:none;\">Shutdown Kernel</button>\n",
    "        \n",
    "<script>\n",
    "try {\n",
    "    els = document.getElementsByClassName(\"sm-command-button\");\n",
    "    els[0].click();\n",
    "}\n",
    "catch(err) {\n",
    "    // NoOp\n",
    "}    \n",
    "</script>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%javascript\n",
    "\n",
    "try {\n",
    "    Jupyter.notebook.save_checkpoint();\n",
    "    Jupyter.notebook.session.delete();\n",
    "}\n",
    "catch(err) {\n",
    "    // NoOp\n",
    "}"
   ]
  }
 ],
 "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
